252 research outputs found

    Genome-wide analysis of alternative splicing in cow: implications in bovine as a model for human diseases

    Background: Alternative splicing (AS) is a primary mechanism of functional regulation in the human genome, with 60% to 80% of human genes being alternatively spliced. As part of the bovine genome annotation team, we analysed 4,567 bovine AS genes, compared with 16,715 human and 16,491 mouse AS genes, along with Gene Ontology (GO) analysis. We also analysed the two most important AS events, cassette exons and intron retention, in 94 human disease genes and mapped them to the bovine orthologous genes. For 12 of the 94 human inherited disease genes, those with orthologous genes that have been characterised in cow, a protein domain analysis was carried out on the transcript sequences. Results: Of the 21,755 bovine genes, 4,567 (21%) are alternatively spliced, compared to 16,715 (68%) in human and 16,491 (57%) in mouse. Gene-level analysis of the orthologous set suggested that bovine genes show fewer AS events than human and mouse genes. A detailed examination of the 94 human disease genes across human and cow suggested that a majority of human cassette exons were present and constitutive in bovine, whereas for intron retention 50% of the exons were present and 50% absent in cow. We observed that AS plays a major role in disease implications in human through manipulations of essential/functional protein domains. It was also evident that, for the majority of these 12 genes, all essential domains were conserved in the bovine orthologue. Conclusion: While alternative splicing has the potential to create many mRNA isoforms from a single gene, in cow the majority of genes generate two to three isoforms, compared to six in human and four in mouse. Our analyses demonstrated that a smaller number of bovine genes show greater transcript diversity. GO definitions for bovine AS genes provided 38% more functional information than currently available in the sequence database. Our protein domain analysis helped us verify the suitability of using bovine as a model for human diseases and also recognize the contribution of AS towards the disease phenotypes.
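
The bovine percentage quoted in this abstract can be checked directly from the two gene counts it gives. The sketch below is only an arithmetic check: the helper name `percent` is illustrative, and the human and mouse totals are not stated in the abstract, so they are not recomputed here.

```python
# Gene counts quoted in the abstract above.
bovine_total = 21_755   # all bovine genes
bovine_as = 4_567       # alternatively spliced bovine genes


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


bovine_pct = percent(bovine_as, bovine_total)
print(f"bovine AS genes: {bovine_pct:.1f}%")
```

Rounded to the nearest whole number, the result agrees with the 21% reported in the Results.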

    In silico secretome analysis approach for next generation sequencing transcriptomic data

    Background: Excretory/secretory proteins (ESPs) play a major role in parasitic infection as they are present at the host-parasite interface and regulate the host immune system. In the case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and to develop novel therapeutic strategies against parasitic infections. However, no transcriptomic study has extensively covered ES protein prediction for identifying novel therapeutic targets, especially as parasites adopt non-classical secretion pathways. Results: We developed a semi-automated computational approach for prediction and annotation of ES proteins using transcriptomic data from next generation sequencing platforms. For the prediction of non-classically secreted proteins, we used an improved computational strategy, together with homology matching to a dataset of experimentally determined parasitic helminth ES proteins. We applied this protocol to analyse 454 short reads of the parasitic nematode Strongyloides ratti. From 296231 reads, we derived 28901 contigs, which were translated into 20877 proteins. Based on our improved ES protein prediction pipeline, we identified 2572 ES proteins, of which 407 (1.9%) have classical N-terminal signal peptides, 923 (4.4%) were computationally identified as non-classically secreted, and 1516 (7.26%) were identified by homology to experimentally identified parasitic helminth ES proteins. Out of 2572 ES proteins, 2310 (89.8%) had homologues in the free-living nematode Caenorhabditis elegans and 2220 (86.3%) in parasitic nematodes. We could functionally annotate 1591 (61.8%) ES proteins with protein families and domains and establish pathway associations for 691 (26.8%) proteins. In addition, we identified 19 representative ES proteins that have no homologues in the host organism but are homologous to lethal RNAi phenotypes in C. elegans, as potential therapeutic targets. Conclusion: We report a comprehensive approach using freely available computational tools for the secretome analysis of NGS data. This approach has been applied to S. ratti 454 transcriptomic data for in silico excretory/secretory protein prediction and analysis, providing a foundation for developing new therapeutic solutions for parasitic infections.

    Bioinformatics Education—Perspectives and Challenges

    This article discusses the evolution of curriculum, instructional methodologies and initiatives supporting the dissemination of bioinformatics. Building on the early applications of informatics to the field of biology, bioinformatics research entails input from the diverse disciplines of mathematics and statistics, physics and chemistry, and medicine and pharmacology. Training in bioinformatics remains the oldest and most important rapid introduction approach to learning bioinformatics skills.

    Comprehensive splicing graph analysis of alternative splicing patterns in chicken, compared to human and mouse

    Background: Alternative transcript diversity manifests itself as a prime cause of complexity in higher eukaryotes. Recently, transcript diversity studies have suggested that 60–80% of human genes are alternatively spliced. We have used a splicing pattern approach for the bioinformatics analysis of alternative splicing (AS) in chicken, human and mouse. Exons involved in splicing are subdivided into distinct and variant exons, based on the prevalence of the exons across the transcripts. The four possible permutations of these two groups of exons were categorised as class I (distinct-distinct), class II (distinct-variant), class III (variant-distinct) and class IV (variant-variant). This classification quantifies the variation in transcript diversity in the three species. Results: In all, 3,901 chicken AS genes were compared with 16,715 human and 16,491 mouse AS genes, with 23% of chicken genes being alternatively spliced, compared to 68% in human and 57% in mouse. To minimize any gene structure bias in the input data, comparative genome analysis was carried out on the orthologous subset of AS genes for the three species. Gene-level analysis suggested that chicken genes show fewer AS events than human and mouse genes. An event-level analysis showed that the percentage of AS events in chicken is similar to that of human, which implies that a smaller number of chicken genes show greater transcript diversity. Overall, chicken genes were found to have fewer transcripts per gene and shorter introns than human and mouse genes. Conclusion: In chicken, the majority of genes generate only two or three isoforms, compared to almost eight in human and six in mouse. We observed that intron definition is expressed strongly compared to exon definition in the chicken genome, based on 3% intron retention in chicken, compared to 2% in human and mouse. Splicing patterns with variant exons account for 33% of AS chicken orthologous genes, compared to 24% in human and 27% in mouse, providing a novel measure to describe the species-wise complexity due to alternative transcript diversity.

    pDOCK: a new technique for rapid and accurate docking of peptide ligands to Major Histocompatibility Complexes

    Background: Identification of antigenic peptide epitopes is an essential prerequisite in T cell-based molecular vaccine design. Computational (sequence-based and structure-based) methods are inexpensive and efficient compared to experimental approaches in screening numerous peptides against their cognate MHC alleles. In structure-based protocols, suited to alleles with limited epitope data, the first step is to identify high-binding peptides using docking techniques, which need improvement in speed and efficiency to be useful in large-scale screening studies. We present pDOCK, a new computational technique for rapid and accurate docking of flexible peptides to MHC receptors, and apply it primarily to a non-redundant dataset of 186 pMHC (MHC-I and MHC-II) complexes with X-ray crystal structures. Results: We compared our docked structures with experimental crystallographic structures for the immunologically relevant nonameric core of the bound peptide for MHC-I and MHC-II complexes. Primary testing for re-docking of peptides into their respective MHC grooves generated 159 out of 186 peptides with a Cα RMSD of less than 1.00 Å, with a mean of 0.56 Å. Amongst the 25 peptides used for single and variant template docking, the Cα RMSD values were below 1.00 Å for 23 peptides. Compared to our earlier docking methodology, pDOCK shows up to a 2.5-fold improvement in accuracy and is ~60% faster. Results of validation against previously published studies represent a seven-fold increase in pDOCK accuracy. Conclusions: The limitations of our previous methodology have been addressed in the new docking protocol, making it a rapid and accurate method to evaluate pMHC binding. pDOCK is a generic method and, although benchmarked against experimental structures, it can be applied to alleles with no structural data using sequence information. Our outcomes establish the efficacy of our procedure to predict highly accurate peptide structures, permitting conformational sampling of the peptide in the MHC binding groove. Our results also support the applicability of pDOCK for in silico identification of promiscuous peptide epitopes that are relevant to higher proportions of the human population, with greater propensity to activate T cells, making them key targets for the design of vaccines and immunotherapies.
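
The accuracy figures in this abstract are Cα RMSD values, that is, the root-mean-square deviation between paired alpha-carbon positions of the docked and crystallographic peptide. The sketch below shows only the standard RMSD formula, not the pDOCK protocol. It assumes both coordinate sets are already in the same reference frame (no superposition is performed), and the function name `ca_rmsd` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(docked: Sequence[Coord], crystal: Sequence[Coord]) -> float:
    """RMSD (in the input units) over paired C-alpha coordinates.

    Assumes the two structures share a reference frame; no fitting is done.
    """
    if len(docked) != len(crystal) or len(docked) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(squared / len(docked))


# Hypothetical two-residue example: every atom is displaced by exactly 1 Å.
docked_example = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_example = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = ca_rmsd(docked_example, crystal_example)
```

For the nonameric core discussed in the abstract, each list would hold nine coordinates, one per residue.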

    In silico characterization of immunogenic epitopes presented by HLA-Cw*0401

    © 2007 Tong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License

    A comparative structural bioinformatics analysis of inherited mutations in β-D-Mannosidase across multiple species reveals a genotype-phenotype correlation

    Background: Lysosomal β-D-mannosidase is a glycosyl hydrolase that breaks down the glycosidic bonds at the non-reducing end of N-linked glycoproteins. Hence, it is a crucial enzyme in the polysaccharide degradation pathway. Mutations in the MANBA gene, which codes for lysosomal β-mannosidase, result in improper coding and malfunctioning of the protein, leading to β-mannosidosis. Studying the location of mutations on the enzyme structure is a rational approach to understanding the functional consequences of these mutations. Accordingly, the pathology and clinical manifestations of the disease could be correlated to the genotypic modifications. Results: The wild-type and inherited mutations of β-mannosidase were studied across four different species (human, cow, goat and mouse) using a previously demonstrated comprehensive homology modeling and mutational mapping technique, which reveals a correlation between the variation of genotype and the severity of phenotype in β-mannosidosis. The X-ray crystallographic structure of β-mannosidase from Bacteroides thetaiotaomicron was used as the template for 3D structural modeling of the wild-type enzymes containing all the associated ligands. These wild-type models subsequently served as templates for building mutational structures. Truncations account for approximately 70% of the mutational cases. In general, the proximity of mutations to the active site determines the severity of phenotypic expressions. Mapping mutations to the MANBA gene sequence has identified five mutational hot-spots. Conclusion: Although restrained by a limited dataset, our comprehensive study suggests a genotype-phenotype correlation in β-mannosidosis. A predictive approach for detecting likely β-mannosidosis is also demonstrated, where we have extrapolated observed mutations from one species to homologous positions in other organisms based on the proximity of the mutations to the enzyme active site and their co-location from different organisms. Apart from aiding the detection of mutational hotspots in the gene, where novel mutations could be disease-implicated, this approach also provides a way to predict new disease mutations. Higher expression of the exoglycosidase chitobiase is said to play a vital role in determining disease phenotypes in human and mouse. A bigger dataset of inherited mutations as well as a parallel study of β-mannosidase and chitobiase activities in prospective patients would be interesting to better understand the underlying reasons for β-mannosidosis.

    SPdb – a signal peptide database

    BACKGROUND: The signal peptide plays an important role in protein targeting and protein translocation in both prokaryotic and eukaryotic cells. This transient, short peptide sequence functions like a postal address on an envelope by targeting proteins for secretion or for transfer to specific organelles for further processing. Understanding how signal peptides function is crucial in predicting where proteins are translocated. To support this understanding, we present the SPdb signal peptide database, a repository of experimentally determined and computationally predicted signal peptides. RESULTS: SPdb integrates information from two sources: (a) the Swiss-Prot protein sequence database, which is now part of UniProt, and (b) the EMBL nucleotide sequence database. The database update is semi-automated, with human checking and verification of the data to ensure the correctness of the data stored. The latest release, SPdb release 3.2, contains 18,146 entries, of which 2,584 are experimentally verified signal sequences; the remaining 15,562 entries are either signal sequences that fail to meet our filtering criteria or entries that contain unverified signal sequences. CONCLUSION: SPdb is a manually curated database constructed to support the understanding and analysis of signal peptides. SPdb tracks the major updates of the two underlying primary databases, thereby ensuring that its information remains up-to-date.

    Modeling Escherichia coli signal peptidase complex with bound substrate: determinants in the mature peptide influencing signal peptide cleavage

    Background: Type I signal peptidases (SPases) are essential membrane-bound serine proteases responsible for the cleavage of signal peptides from proteins that are translocated across biological membranes. The crystal structure of SPase in complex with signal peptide has not been solved, and the substrate-binding site and binding specificities of SPases remain poorly understood. We report here a structure-based model for Escherichia coli DsbA 13–25 in complex with its endogenous type I SPase. Results: The bound structure of DsbA 13–25 in complex with its endogenous type I SPase reported here reveals the existence of an extended conformation of the precursor protein with a pronounced backbone twist between positions P3 and P1'. Residues 13–25 of DsbA occupy, and thereby define, 13 subsites, S7 to S6', within the SPase substrate-binding site. The newly defined subsites, S1' to S6', play critical roles in the substrate specificities of E. coli SPase. Our results are in accord with available experimental data. Conclusion: Collectively, the results of this study provide interesting new insights into the binding conformation of signal peptides and the substrate-binding site of E. coli SPase. This is the first report on the modeling of a precursor protein into the entire SPase binding site. Together with the conserved precursor protein binding conformation, the existing and newly identified substrate binding sites readily explain SPase cleavage fidelity, consistent with existing biochemical results and solution structures of inhibitors in complex with E. coli SPase. Our data suggest that both signal and mature moiety sequences play important roles and should be considered in the development of predictive tools.

    SVM-based prediction of caspase substrate cleavage sites

    BACKGROUND: Caspases belong to a class of cysteine proteases which function as critical effectors in apoptosis and inflammation by cleaving substrates immediately after unique sites. Prediction of such cleavage sites will complement structural and functional studies on substrate cleavage as well as the discovery of new substrates. Recently, different computational methods have been developed to predict the cleavage sites of caspase substrates with varying degrees of success. As the support vector machine (SVM) algorithm has been shown to be useful in several biological classification problems, we have implemented an SVM-based method to investigate its applicability to this domain. RESULTS: A set of unique caspase substrate cleavage sites was obtained from the literature and used for evaluating the SVM method. Datasets containing (i) the tetrapeptide cleavage sites, (ii) the tetrapeptide cleavage sites augmented by the two adjacent residues, P1' and P2', and (iii) the tetrapeptide cleavage sites with ten additional upstream and downstream flanking residues (where available) were tested. The SVM method achieved an accuracy ranging from 81.25% to 97.92% on independent test sets. The SVM method successfully predicted the cleavage of a novel caspase substrate and its mutants. CONCLUSION: This study presents an SVM approach for predicting caspase substrate cleavage sites based on the cleavage sites and the downstream and upstream flanking sequences. The method shows an improvement over existing methods and may be useful for predicting hitherto undiscovered cleavage sites.
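
To make the dataset definitions in this abstract concrete, the sketch below extracts a cleavage-site window (the P4 to P1 tetrapeptide plus P1' and P2') from a sequence and turns it into a numeric feature vector of the kind an SVM could take as input. The abstract does not say how residues were encoded, so the one-hot encoding is an assumption, and the function names, the example sequence and the cleavage position are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cleavage_window(sequence: str, p1_position: int,
                    upstream: int = 4, downstream: int = 2) -> str:
    """Return residues P4..P1 followed by P1'..P2'.

    p1_position is the 1-based index of the P1 residue (cleavage occurs after it).
    """
    start = p1_position - upstream
    end = p1_position + downstream
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the ends of the sequence")
    return sequence[start:end]


def one_hot(peptide: str) -> list:
    """Encode each residue as a 20-dimensional indicator vector (assumed encoding)."""
    vector = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue: {residue!r}")
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Hypothetical sequence with the P1 aspartate at position 6.
window = cleavage_window("GSDEVDGV", p1_position=6)
features = one_hot(window)
```

Setting `downstream=0` gives the tetrapeptide-only dataset (i), and larger `upstream` and `downstream` values would correspond to the flanking-sequence dataset (iii).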